Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quote shell command correctly for Posix shells #13214

Closed
wants to merge 1 commit into from

Conversation

eschnett
Copy link
Contributor

The quoting is now safe for Posix sh-like shells.

The algorithm tries several ways to quote words (via single or double quotes or backslashes), and then uses a scoring method to choose the "best" result. The current scoring chooses the shortest string, and adds additional penalties for backslashes.

@@ -81,7 +81,7 @@ function shell_parse(raw::AbstractString, interp::Bool)
if done(s,k)
error("unterminated double quote")
end
if s[k] == '"' || s[k] == '$'
if s[k] == '"' || s[k] == '$' || s[k] == '\\'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to shell_parse isn't mentioned in the commit message. Intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Yes, this is intentional (see the "parse" in the title). Currently, quoted backslashes in double quotes are not correctly interpreted; they result in two backslashes: The string "\\" is parsed to yield \\ instead of \.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example? That should really be a separate bug fix with a standalone test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially since that fix will need to be incorporated into @nolta's port of shell_parse to Scheme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example is just this string, "\\". I added test cases to spawn.jl as part of the patch. @nolta's port already has this change, if I read the Lisp correctly.

If you want, I can split this patch into two...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this:

julia> Base.shell_parse("\\")
ERROR: dangling backslash
 in shell_parse at ./shell.jl:63
 in shell_parse at ./shell.jl:117

What do you expect to get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about the string "\\", which is represented as "\"\\\\\"" when input as Julia source. This string should lead to a single backslash. See e.g. in a shell:

echo "\\"
\

The first backslash escapes the second (this still happens inside double quotes), and shell_parse strips quotes around arguments.

In Julia:

str="\"\\\\\""
println(str)
Base.shell_split(str)
["\\"]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the same thing – both produce a string containing one backslash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I type this into a current Julia, I receive an error instead of the last line. (You asked for the "expected" output.) The test cases I added to spawn.jl also report a disagreement where parsing yields too many backslashes -- sorry, I don't have the source string handy, it was more complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strict bug fix and unrelated to the rest of this change, so I've pulled it out and applied it here: cd59221.

@StefanKarpinski
Copy link
Member

So the main things that give me pause here are the increased complexity and the dependence on regular expressions for something so low level. In general, we try to avoid using regexes if possible since it would be nice someday for some minimal subset of Julia not to depend on PCRE. But maybe that is a quixotic endeavor. @JeffBezanson, what do you think about the increased regex usage here?

@eschnett
Copy link
Contributor Author

The complexity comes from trying to create string representations that are both correct and "nice". If we split this across two functions, then we can (a) keep the original function for "niceness" and (b) add a new five-line function that doesn't use regexps for correctness.

Also, the regexps are straightforward and could easily be replaced with a direct implementation. However, this would be significantly longer (maybe three or five times).

@StefanKarpinski
Copy link
Member

Am I correct that this is really just expanding the set of characters that are considered "special" relative to the original implementation?

@eschnett
Copy link
Contributor Author

Yes, in principle this just switches from treating just a few characters as special, to treating them all as special and using a white list for non-special characters. Of course, the detail about how to decide which way to quote is different.

I took an existing, well-tested Python implementation and converted it to Julia. Apart from translation errors the regexps should be correct.

The quoting is now safe for Posix sh-like shells.

The algorithm tries several ways to quote words (via single or double quotes or backslashes), and then uses a scoring method to choose the "best" result. The current scoring chooses the shortest string, and adds additional penalties for backslashes.
@eschnett eschnett changed the title Correctly parse and quote shell commands Quote shell command correctly for Posix shells Sep 18, 2015
@eschnett
Copy link
Contributor Author

This patch is ready to be applied, but we should probably wait until after #13215 has been applied.

Currently, the REPL shell mode uses this function (incorrectly assuming it does full Posix shell quoting), so this patch would currently have user-visible effects. I assume people are currently relying on the fact that such quoting doesn't happen, as this is what e.g. currently allows file redirection and pipelines there.

@vtjnash
Copy link
Member

vtjnash commented Mar 26, 2019

Superseded by #30636 (which added a "shell_escape_posixly" function that ensures quoting is safe for any posix shell, vs. the "shell_escape" function which pretty-prints Julia cmd syntax)

@vtjnash vtjnash closed this Mar 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants